Corpus-Based Extension of Semantic Lexicons in Large Scale

Authors

  • Dimitrios Kokkinakis
  • Maria Toporowska Gronostaj
  • Karin Warmenius
Abstract

In recent years there has been increased interest in acquiring or extending high-quality semantic lexicons on a large scale. The methodology is usually corpus-driven: it is based on the (re-)use of machine-readable resources of various types and on the application of cost-effective ways to eliminate the acquisition bottleneck, i.e. derivational morphology, customization of off-the-shelf resources, statistical techniques and shallow parsing. This paper investigates how, and to what extent, the flexibility and robustness of a partial parser can be utilized to achieve this goal fully automatically. Our work is based on the observation that members of a semantic group are often surrounded by other members of the same group in text. Given a few category members, we use parsed corpora to collect surrounding contexts and try to identify other words that also belong to the same group.
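
The seed-based idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the parser output, the scoring, or any thresholds, so everything below is an assumption made for illustration. Each context is assumed to be a list of head words already extracted by a parser (for example the members of a coordination), and the hypothetical function `expand_category` ranks non-seed words by the share of their contexts that also contain a seed. The parameters `min_score` and `min_count` are likewise invented for the example.

```python
from collections import Counter


def expand_category(seeds, contexts, min_score=0.5, min_count=2):
    """Rank non-seed words by how consistently they co-occur with seed words.

    seeds:    known members of one semantic group (assumed input).
    contexts: iterable of word lists; each list stands in for one parsed
              context, e.g. the heads of a coordinated noun phrase.
    Returns a list of (word, score, count) tuples, best candidates first.
    """
    seeds = set(seeds)
    with_seed = Counter()  # contexts that hold the word and at least one seed
    total = Counter()      # all contexts that hold the word

    for context in contexts:
        words = set(context)
        has_seed = bool(words & seeds)
        for word in words - seeds:
            total[word] += 1
            if has_seed:
                with_seed[word] += 1

    candidates = []
    for word, count in with_seed.items():
        score = count / total[word]
        # Illustrative thresholds, not values taken from the paper.
        if count >= min_count and score >= min_score:
            candidates.append((word, score, count))

    # Highest score first, then highest count, then alphabetical order.
    candidates.sort(key=lambda c: (-c[1], -c[2], c[0]))
    return candidates


if __name__ == "__main__":
    # Toy contexts invented for this example.
    toy_contexts = [
        ["apple", "pear", "plum"],
        ["pear", "banana"],
        ["apple", "plum", "cherry"],
        ["car", "bus", "train"],
        ["banana", "plum", "apple"],
        ["car", "apple"],
    ]
    for word, score, count in expand_category({"apple", "pear"}, toy_contexts):
        print(word, round(score, 2), count)
```

In a bootstrapping setting, the accepted candidates could be added to the seed set and the procedure repeated, at the risk of drifting away from the original category if the thresholds are too permissive.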

Similar resources

Lexical Coverage Evaluation of Large-scale Multilingual Semantic Lexicons for Twelve Languages

The last two decades have seen the development of various semantic lexical resources such as WordNet (Miller, 1995) and the USAS semantic lexicon (Rayson et al., 2004), which have played an important role in the areas of natural language processing and corpus-based studies. Recently, increasing efforts have been devoted to extending the semantic frameworks of existing lexical knowledge resource...

Automatic Extension of Feature-based Semantic Lexicons via Contextual Attributes

We describe how a feature-based semantic lexicon can be automatically extended using large, unstructured text corpora. Experiments are carried out using the lexicon HaGenLex and the Wortschatz corpus. The semantic classes of nouns are determined via the adjectives that modify them. It turns out to be reasonable to combine several classifiers for single attributes into one for complex semantic c...

Harmonised large-scale syntactic/semantic lexicons: a European multilingual infrastructure

The paper aims at providing an overview of the situation of Language Resources (LR) in Europe, in particular as emerging from a few European projects regarding the construction of large-scale harmonised resources to be used for many applicative purposes, also of a multilingual nature. An important research aspect of the projects is given by the very fact that the large enterprise described is, at ...

Generating Semantic Orientation Lexicon using Large Data and Thesaurus

We propose a novel method to construct semantic orientation lexicons using large data and a thesaurus. To deal with large data, we use Count-Min sketch to store the approximate counts of all word pairs in a bounded space of 8GB. We use a thesaurus (like Roget) to constrain near-synonymous words to have the same polarity. This framework can easily scale to any language with a thesaurus and a unz...

AnCora-Verb: A Lexical Resource for the Semantic Annotation of Corpora

In this paper we present two large-scale verbal lexicons, AnCora-Verb-Ca for Catalan and AnCora-Verb-Es for Spanish, which are the basis for the semantic annotation with arguments and thematic roles of AnCora corpora. In AnCora-Verb lexicons, the mapping between syntactic functions, arguments and thematic roles of each verbal predicate is established taking into account the verbal semantic c...

Publication date: 2001